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PACS 71.27.+a - Strongly correlated electron systems; heavy fermions 
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Abstract. - We present a continuous-time Monte Carlo method for quantum impurity models, 
which combines a weak-coupling expansion with an auxiliary-field decomposition. The method is 
considerably more efficient than Hirsch-Fye and free of time discretization errors, and is particu- 
larly useful as impurity solver in large cluster dynamical mean field theory (DMFT) calculations. 



Introduction. — The development of efficient numer- 
ical methods for solving quantum impurity models has 
been driven in recent years by the success of dynami- 
cal mean field theory (DMFT) [1-3] and its extensions. 
DMFT is an approximate framework for the study of 
fcrmionic lattice models, which replaces the lattice by 
a quantum impurity embedded in a self-consistent bath. 
Both cluster-extensions of DMFT [3-7] and realistic elec- 
tronic structure calculations, which combine DMFT with 
band structure methods [3], involve multi-site or multi- 
orbital impurity models (e.g. for d- and /-electron sys- 
tems), whose solution is computationally expensive and 
in practice the bottleneck of the calculations. In order to 
facilitate progress in this field, it is therefore important to 
develop fast and accurate impurity solvers. 

Until recently, the Hirsch-Fye auxiliary field method [8] 
has been the only Quantum Monte Carlo impurity solver 
used in DMFT. It suffers from one major drawback: it re- 
quires a discretization of the imaginary time interval into 
a large number TV of time slices and therefore the calcula- 
tion of determinants of Nn s x Nn s matrices (for models 
with n s sites and on-site interactions only), which is com- 
putationally expensive. Furthermore, this discretization 
introduces a systematic error which needs to be dealt with 
(in principle) through tedious extrapolations N — > oo. 

Important progress was achieved recently with the de- 
velopment of continuous-time impurity solvers, which are 
based on the stochastic sampling of a diagrammatic ex- 



pansion of the partition function. These methods do not 
suffer from time discretization errors and allow the sim- 
ulation of models with more general interactions. The 
first continuous-time impurity solver was proposed by 
Rubtsov et al. [9], who expanded the partition function 
in the interaction terms and used Wick's theorem. An- 
other powerful and flexible diagrammatic solver for small 
impurity problems, based on a diagrammatic expansion 
in the impurity-bath hybridization, has been proposed in 
Refs. [10-12]. Since this method perturbs around an ex- 
actly solved atomic limit, it is particularly efficient at mod- 
erate and strong interactions [13]. While the sign problem 
in this algorithm is less severe than in Hirsch-Fye or in 
the weak-coupling continuous-time method, the computa- 
tional effort scales exponentially with the number of sites 
and orbitals, making it difficult or impossible to solve clus- 
ters with eight or more sites. As a consequence, Hirsch-Fye 
is currently still considered the method of choice for large 
cluster DMFT computations. 

In this paper, we present a new continuous-time impu- 
rity solver which combines a weak-coupling expansion with 
an auxiliary field decomposition, and which was inspired 
by the work of Rombouts et al. [14] for lattice models. 
Our method is formally similar to the Hirsch-Fye algo- 
rithm, but as a weak-coupling solver performs comparable 
to Rubtsov's method. 

Method. — We present the algorithm for the single- 
impurity model corresponding to the DMFT solution of 
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the one-band Hubbard model. The extension to multi- 
band or cluster models with density-density coupling is 
straightforward and will be briefly discussed at the end of 
this section. 

The partition function for the impurity model can be 
written as a path integral over Grassman variables £ and 
£*, Z = J ]e- s , with effective action 



S = fdrdr' £(r)\g£(T r')k CT (r' 
J ° <r=T4 

- i j I t \ t \ n t( T ) + n l( T ) 



function is of the form 



)- (D 



Here, n = and g^ is related to the "conventional" non- 
interacting Green's function of Ref. [1] by the expression 
g^^LUn) = — (So~conv(* a; ™) — U/2), which means that the 
chemical potential is shifted by —U/2 and go(r) > for 
< r < 13. 

In order to closely follow the standard derivation of the 
Hirsch-Fye algorithm (see e.g. Rcf. [1]). we switch to the 
Hamiltonian formulation 

H = Hq + V, (2a) 
H = -(fi-U/2)(n r + n L ) 

(2b) 



V 



U(n ] n l - (n t + n{)/2), 



(2c) 



where Hq is the Gaussian term containing both the im- 
purity (c) and the bath (a) degrees of freedom. Following 
Rombouts et al. [14] we introduce a constant K, express 
the partition function in an interaction representation, 

Z = Tre-P* = e - K Tt\e-^T T e- ^(V{r)-K/p)\ (3) 

and expand the time ordered exponential in powers of 
K//3 — V (dropping the irrelevant factor e~ K ): 



z = J2 / d n . 



n>0 
X 



-.(f)"- 



l-^)...e-(--^(l-^) e -^ 



(4) 



We then decouple the interaction terms as follows [14]: 



pv = i v 

K 2 <H 



ol s i. r 



s=-l,l 

cosh( 7 ) = 1 + [pU)/(2K] 



(5a) 
(5b) 



Expressions (5a) and (5b) are valid for arbitrary (complex) 
parameters K. If K > 0, 7 is real and the expansion 
parameter is positive. After the decoupling, the partition 



Z =y^y2 / d Tl ... drj—) Z„({s i ,T J }), 

~ l<i<n 

(6a) 

1 

Z n ({si, n}) = Tr cxp(-AT ? i7 ) exp(sj7(n T - nj), 

i—n 

(6b) 

with At{ = Tj+i — T{ for i < n and Ar„ = 0— T n +Ti. ^„ is 
very similar to the expression for the partition function in 
the Hirsch-Fye algorithm after the Trotter approximation, 
see for example Eq. (117) of Ref. [1], except that the time 
arguments of the auxiliary spins Sj are not regularly spaced 
on [0, 0\. Indeed, one can straightforwardly generalize the 
calculation in Ref. [1] to rewrite Z n /Z Q (Z = Tre"^ ) 
as: 



Z n {{si,n}) 
Z a 

iV-^KrJ) 



JJ det N-'ds^n}), (7) 
*=T4 



G& } (e"-"-l), (8) 
diag(e 7( - irsi ,...,e^(- 1)CTs "), (9) 



with the notations (— ly = 1, (— l)^ = —1 and 

( G o? } )i,j = 30a {n - Tj) for i^j, (Gll i} ) iti = g O a(0 + )- 

While we tried to emphasize in our derivation the sim- 
ilarities to the Hirsch-Fye algorithm, let us note at this 
point also the essential differences between Hirsch-Fye and 
our continuous-time auxiliary field method (CT-AUX): i) 
CT-AUX is based on a weak-coupling expansion, not a 
Suzuki- Trotter decomposition of the partition function; ii) 
the auxiliary fields in CT-AUX originate from Rombout's 
decoupling formula (5a). In particular, CT-AUX does not 
require any time discretization. The number and position 
of auxiliary spins on the imaginary time interval is arbi- 
trary and changes constantly during the simulation. 

The formulae above are easily generalized for cluster and 
multiorbital DMFT problems with density-density inter- 
actions by performing a similar expansion for all the in- 
teraction terms. For clusters of size n s with local density- 
density interaction U (relevant e.g. for cluster DMFT ap- 
proximations of the Hubbard model), expression (5b) for 
7 remains unchanged and other formulas, like Eq. (7), can 
be generalized straightforwardly by replacing the time and 
spin indices by (time, site) and (spin, site) multi-indices 
respectively. The multiplicative factor dropped from the 
partition function is exp(— Kn s ) in this case. 

Sampling procedure. Our algorithm samples time or- 
dered configurations consisting of spins s\, . . . , s n at times 
n < T2 < . . . < T n with weight 



w({Si,Ti}) 
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For ergodicity it is sufficient to insert /remove spins with 
random orientation at random times. 

The detailed balance condition can be implemented as 
follows. Assuming that we pick a random time in the 
interval [0, 0) and a random direction for this new spin 
(p prop (n — > n+ 1) = (1/2){cLt/ (3)), and propose to remove 
it with probability p pmp {n +1 — > n) = l/(n + 1), we get 



p(n 



1) 



p(n + 1 — > n) n + 



K_ n det(M" +1 V 1 



(11) 



The matrices N a = (e v ° - G 0a (e Va - -0) -1 are stored 
and manipulated using fast update formulas analogous to 
those of Refs. [9, 10]. When inserting a spin we add a 
new row and column to N^ 1 . Following the notation of 
Ref. [15], we define the blocks (omitting the a index until 
the end of this section) 



(iV (n+1) )- 



(iV(™))" 
R 



N 



(n+l) 



P Q 
R S 



(12) 



where Q, R, S denote (1 x n), (n x 1), (1 x 1) matri- 
ces, respectively, which contain the contribution of the 
added spin. The determinant ratio needed for the accep- 
tance/rejection probability is then given by 



det(A^("+ 1 ))- 1 
det^H)- 1 



1 



detS 



S-R(nW)Q. (13) 



As we store N^ n \ computing the acceptance/rejection 
probability of an insertion move involves a matrix-vector 
multiplication followed by an inner product, i.e. an 0(n 2 ) 
operation. If a move is accepted, a rank one update 
is performed to compute the new matrix 7V ( ™ +1 ' ) out of 
N^ n \Q,R, and S: 



S={S-R{N {n) Q])-\ 

Q = -[N^Q]S, 

R=-S[RN^}, 

P = + [nWQ]S[RNW] 



(14a) 
(14b) 
(14c) 
(14d) 



Measurement of the Green's function. The main ob- 
servable of interest in the simulations is the Green's func- 
tion 3ct(t, t'). First, let us note from (6) that one can add 
two additional "non-interacting" spins s = s' = at any 
fixed times r and r' (we denote with a tilde the corre- 
sponding matrices of size n + 2). Zg a (r,T') is then given 
by an expression similar to Eqs. (6), with an insertion of 
c(t) and c^(r') at the corresponding times. We can again 
use the standard Hirsch-Fyc formula for the discretized 
Green function (Eq. (118) of Ref. [1]) to obtain 



s.(ry)4E 



K 



n>0 



x Z n ({ Sl ,T,})G(^ T -)(r,r'), 



(15) 



with Gl s " Tl} = Nrdsun^GZ'*. Since s = s' = 0, a 
simple block calculation leads to 

Gl^}(T,T')= 90a (T,T') 



^2 90a{T,T k ) ( 



An} 



l)N a {{s u n}) 5 0a(r ; ,r') 



k,l=l 



kl 



(16) 



In order to compute the Green's function, one can not 
just accumulate its values at the discrete times Tj of the 
auxiliary spins, since the {r^} are correlated. Rather, the 
Green's function is accumulated using Eqs. (15) and (16): 

9a(r) = go,(T) + f dfg 0a (T-f)(sj Si ' T *Hf)), (17) 



S0^}(f)=££(f-r fe )£M ( 
fc=i 



^goAn), (18) 



1=1 



M, 



{si,Ti} = 

kl — 



An} 



[(e v ° ' -l)N a ({ Sl ,r l })} ) 



(19) 



where we have used translational invariancc, set r' = 0, 
and denoted the Monte Carlo average with angular brack- 
ets (our convention is g(r) > for < r < ff). Hence, we 
measure only the quantity (S'| s " r! ^(f )), which we bin into 
fine bins. After the simulation is completed, the Green's 
function is constructed using Eq. (17). 
Note that the Dyson equation 

g<j{iu n ) = ga a {iuJn) + s , o<r(jw„)S .(iw„)g .(ia;„) (20) 

implies that this procedure amounts to accumulating 
Ea-^cr. Besides the higher efficiency with respect to the 
direct accumulation of the Green's function, an important 
advantage of such a measurement is the reduction in high- 
frequency noise by the multiplication with go ~ l/cu n (see 
also Ref. [16] for similar ideas in the NRG-DMFT context). 

Let us emphasize that the same procedure can also be 
employed in the weak coupling algorithm, where it yields 
significant performance gains over the methods described 
in Refs. [9] and [13], especially for large clusters. 

Four point functions. Four point correlation functions 
can also be computed in a similar way as in Hirsch-Fye 
using the fact that for a fixed auxiliary spin configuration 
the problem is Gaussian and Wick's theorem can therefore 
be used together with Eq. (16). Thus the problem reduces 
to the accumulation of the determinant of a 2 x 2 matrix 



(9l 2 
(So 32 



9l k M kl 

3k M { 



9%) 



9o 



ki 



i} J2 



9 l o 2 ) 



(9I 4 

(5o 34 



9l k M kl 



{S„T,} 

kl 



defined in Eq. (19). 



l 9 l o A ) \ 

(21) 

If only a few corrcla- 



with AI t 

tion functions are measured, Eq. (21) is best evaluated 
directly during the simulation. If many or all correlation 
functions have to be measured at n T time points and the 
size riM of M is comparatively small, it is advantageous 



to accumulate only (M, 



and (M, 
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Fig. 1: Average perturbation order for the continuous-time 
auxiliary field algorithm (K — 1) and the weak-coupling algo- 
rithm (with a = 0.01). Single-site Hubbard model, half-filling, 
semi-circular density of states of bandwidth At, and (it = 30. 
Inset: Expansion order (matrix size) as a function of K. Single 
site Hubbard model, half-filling, semicircular density of states 
of bandwidth At, U/t = 4, and (it = 10. 



reconstruct the correlation function at the end of the com- 
putation. Indeed, while binning the latter expression is 
O(n^) in memory, it is only 0{n 3 M ) computationally (us- 
ing time translation invariance). 

Role of the expansion parameter K . The average per- 
turbation order (n c t a ux) is related to the parameter K, 
potential energy and filling by 

(n c taux) = K-[3U fani - (n T + ni)/2). (22) 

This expression is obtained by applying the operator 
^9k\u/ k to InZ both in its original form (3) and to 
(6), including the factor e~ K dropped after Eq. (3) (see 
also Rcf. [14]). In the case of the weak-coupling algo- 
rithm [9], (n wc ) Q ^ = -/3C/(«T n l ~ ( n T + n l)/2), where 
a is the small parameter which must be introduced to re- 
duce the sign problem. Hence, the perturbation order in 
the continuous-time auxiliary field method grows linearly 
with K (see inset of Fig. 1) and (n ctaux )^_o = (n wc ) a ->o- 

Figure 1 shows the perturbation orders for the two 
methods as a function of U. For these small values of 
K and a, the perturbation orders are essentially identi- 
cal. Both weak-coupling methods scale roughly linearly 
with U, with a kink visible at the Mott critical value. It 
also follows from Eq. (22) that the perturbation order is 
essentially linear in the inverse temperature (3. 

Similar to the weak-coupling expansion parameter a [9] , 
the parameter K can be freely adjusted. While a larger K 
yields a larger expansion order, it also reduces the value 
of 7 (see Eq. (5)). This makes it easier to flip auxiliary 
spins. Therefore the auxiliary spins have less tendency to 
polarize for larger K . In practice, however, .fT-values of 
order 1 turned out to be adequate. Although we found 
that the sign problem improves slightly with larger K, 
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Fig. 2: Upper panel: real-space Green's functions (onsite, 
nearest-neighbor and next-nearest neighbor) for the four site 
cluster with nearest neighbor hopping t, U/t = 4, filling = 
0.9, (it — 2.5. Hirsch Fye results with 40 time slices are repre- 
sented by the symbols, the weak coupling and CT-AUX results 
by lines (on top of each other). Lower panel: real space Green's 
functions for the eight-site cluster obtained in 8 CPU hours us- 
ing CT-AUX and Hirsch-Fye. 80 time slices were considered in 
the Hirsch-Fye simulation. Note the fact that the Hirsch-Fye 
result is slightly spin-polarized. 



this small gain is more than compensated by the increase 
in computational cost at larger values of K . 

Comparison with other QMC methods. To 

compare to other methods we implemented single-site, as 
well as 4 and 8 site cluster calculations in the dynamical 
cluster approximation (DCA) [4,17], and expect similar 
results for other cluster schemes such as cellular dynam- 
ical mean field theory (CDMFT) [7]. The upper panel 
of Fig. 2 shows a typical real space cluster Green's func- 
tion for a 4-site DCA calculation. The CT-AUX results 
are identical to the other QMC results, showing the ac- 
curacy of the new approach. The lower panel of Fig. 2 
shows Green's functions for an 8-site cluster (/? = 20, t = 
0.25, U = 2, n — —0.3757) obtained from a converged go 
in 8 CPU hours on a 1.6 GHz Opteron 244. Symbols show 
the result for Hirsch-Fye with 80 time slices, and the lines 
indicate the result from CT-AUX, measured at 500 time 



p-4 



Continuous-time auxiliary field Monte Carlo for quantum impurity models 



300 



250- 



9 200- 

Oh 

X 

W 150- 



£3 ioo 



X 



>CT AUX 

i Hirsch Fye nfSU 



so- 



io 15 20 25 

Pt 





-0.2 
-0.4 



• • HF, Ax = 1 

HF, Ax = 0.5 
•-• HF, Ax = 0.25 




0.5 



CO 



1.5 



2.5 



Fig. 3: Expansion order as a function of /3 for the four site clus- 
ter with nearest neighbor hopping, U = 2, fj, = —0.3757, t = 
0.25. For Hirsch-Fye, a reasonable compromise between accu- 
racy and speed would require at least iV = f3Un s time slices, 
which leads to larger matrices whose determinants need to be 
updated. 



Fig. 4: Imaginary part of the self-energy for the DMFT solution 
of the single site Hubbard model. CT-AUX, Hirsch-Fye using 
32, 64 and 128 auxiliary spins (time slices), ED with 6 bath 
sites (/3 = 32, U = 3). Hirsch-Fye and ED results were taken 
from Fig. 15 of Ref. [1]. For CT-AUX, the average number of 
auxiliary spins is (n) = 42.5. 



points. 

As a continuous time method CT-AUX has a definite 
advantage over Hirsch-Fye QMC, since it removes the ne- 
cessity of careful extrapolations to the continuous time 
limit. However, in order to be a useful replacement for 
Hirsch-Fye in practice, CT-AUX has to satisfy two re- 
quirements: i) The average expansion order (n), which 
determines the complexity of the calculation (0({n) )), 
has to be smaller than the number of times slices required 
in Hirsch-Fye (close to the continuous limit, where extrap- 
olation is meaningful); ii) The sign problem must not be 
worse than in previous algorithms. 

Expansion order. First we compare the expansion or- 
der to Hirsch-Fye for a high temperature 2x2 DCA calcu- 
lation in Fig. 3. In Hirsch-Fye the number of time slices 
was fixed a priori using the optimistic criterion N = (3Un s 
which corresponds to AtU = 1, where At is the size of 
the time slices - just barely in the region of validity of the 
Trotter approximation underlying the Hirsch-Fye method. 
CT-AUX with its roughly two times lower average pertur- 
bation order is much more efficient since both algorithms 
scale like the cube of the matrix size. In the 8-site cluster 
simulation of Fig. 2, the Hirsch-Fye algorithm with 80 time 
slices had to update matrices of size 640, while CT-AUX 
merely had to operate on matrices of average size 136. 
This means that CT-AUX allows to reach substantially 
lower temperatures, even with modest computational re- 
sources. 

To make the comparison with Hirsch-Fye more precise 
and get rid of the arbitrariness of the choice of the number 
of times slices in Hirsch-Fye we have reproduced in Fig. 4 
the self-energy calculation presented in Fig. 15 of Ref. [1], 
where the same single-site calculation was performed with 
Hirsch-Fye (32, 64 and 128 time slices) and with exact di- 



agonalization (ED). Even with 128 times slices, the Hirsch- 
Fye results still have substantial discretization errors while 
CT-AUX produces the exact result (comparable to the 
"bath = 6 ED result) with only (n) = 42.5 spins. This 
shows that CT-AUX is indeed much more efficient than 
the Hirsch-Fye method: not only does it compute the nu- 
merically exact result directly, but it does so using signifi- 
cantly less auxiliary spins. This is due to the fact that CT- 
AUX, like Rubtsov's method, is based on a weak-coupling 
expansion (see Fig. 1 and Ref. [13] for a comparison of the 
weak-coupling method with Hirsch-Fye). 

Sign problem. For single site impurity models, there 
is no sign problem since the proof of Ref. [18] can be ex- 
tended to CT-AUX. For cluster calculations, as U and f3 
is increased, the sign becomes smaller than one. How- 
ever, for the unfrustratcd plaqucttc at temperatures down 
to pt = 25 we did not observe a significant sign problem 
((sign) > 0.99). In order to produce a severe sign problem 
at high temperature, we frustrated our plaquette with a 
hopping t' along the diagonal. For the almost triangu- 
lar case t 1 = QM the Hirsch-Fye method, weak-coupling 
method and our solver exhibit a sign problem that be- 
comes increasingly severe as the interaction strength U is 
increased or the temperature T lowered. For U > 7t, the 
average sign is less than 0.2, as seen in Fig. 5, making it 
difficult to access temperatures below T = Q.lt. Remark- 
ably, the average signs in CT-AUX, Hirsch-Fye and the 
weak-coupling algorithm are almost identical. 

Since one of the likely applications of the CT-AUX 
method is the solution of large DMFT clusters (not ac- 
cessible to the hybridization expansion solver), we per- 
formed a similar study for an 8-site cluster, with a similar 
conclusion. The sign problem at a reference point on the 
eight site Bctts cluster (U = 2, t = 0.25, \i = -0.375, 



p-5 



Emanuel Gull 1 Philipp Werner 2 Olivier Parcollct 3 Matthias Troyer 1 




Fig. 5: Upper axis and dashed lines: Sign as a function of fit 
for the 8-site cluster with U = 2, fi = -0.3757, t = 0.25. Lower 
axis and solid lines: Sign as a function of U /t for the frustrated 
plaquette at /3t = 10, t'/t = 0.9. 



/3 = 90) [19] turned out to be the same in Hirsch-Fye as 
in our new algorithm (Fig. 5). 

Conclusion. — We have presented a continuous time 
impurity solver based on a weak-coupling expansion of the 
partition function and an auxiliary field decomposition of 
the interaction terms. The algorithm relies on fast local 
updates of auxiliary Ising spin variables, whose number 
and position are not fixed. As a continuous time solver, 
our method docs not suffer from the deficiencies of the 
Hirsch-Fye algorithm and its variants [13]. In particular, 
it does not require multiple runs for several discretizations 
of the imaginary time interval and subsequent extrapola- 
tions. Moreover, it requires fewer auxiliary field variables 
than a standard Hirsch-Fye calculation. This translates 
into substantial gains in computational efficiency, espe- 
cially for large clusters and at low temperature. 

For all regions of parameter space considered the sign 
problem is approximately the same in the weak coupling, 
Hirsch-Fye and CT-AUX algorithms. Further investiga- 
tion is however needed to determine if this is the case in 
all regions of parameter space and for all cluster geome- 
tries. We expect the new solver to be particularly useful in 
the simulation of large clusters, and to completely replace 
the Hirsch-Fye algorithm. 

* * * 

The calculations have been performed on the Hrcidar 
bcowulf cluster at ETH Zurich, using the ALPS-library, 
[20] and our cluster simulations employed a DCA self- 
consistency loop implemented by S. Fuchs. PW acknowl- 
edges support from NSF-DMR-040135. 
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